Skip to content

fixing beginTransaction() thread safety #6425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
May 9, 2022
Merged

Conversation

smarq8
Copy link
Contributor

@smarq8 smarq8 commented Mar 13, 2022

according to #6404 here is mentioned PR.
TLTR: isnide beginTransaction() code above spiTransaction() is not protected by mutex and in some situation it might be unexpectedly overwrited by other threads. More descrition in link above.

@CLAassistant
Copy link

CLAassistant commented Mar 13, 2022

CLA assistant check
All committers have signed the CLA.

@me-no-dev
Copy link
Member

My issue here is that the C interface (hal) is no longer safe. Let's find a way for this to work properly on all interfaces :)

@@ -153,6 +154,7 @@ void SPIClass::endTransaction()
if(_inTransaction){
_inTransaction = false;
spiEndTransaction(_spi);
spiUnlock(_spi);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible bug. spiEndTransaction() is the same as calling spiUnlock(), so you are calling unlock twice.

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 14, 2022

What about now? Now I add additional mutex for any r/w _freq and _div except begin() section
below screnn before and after change - there are display,2x sensors 1khz and touch screen controller on same SPI bus. On screen plottet raw data from one of the sensors.
screenshot 1647272673

PS. sorry for oftet changes, Im not yet familiar with git, PR's and its proper workflow.

@@ -1059,7 +1059,7 @@ void spiSimpleTransaction(spi_t * spi)
if(!spi) {
return;
}
SPI_MUTEX_LOCK();
SPI_MUTEX_UNLOCK();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible bug _LOCK() changed to _UNLOCK(). I will do a more in depth study of the code this weekend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be LOCK as that is another way to start transaction

@me-no-dev
Copy link
Member

I'll do more thinking and looking back at the code. A good test sketch is also in order :)

@@ -106,19 +120,23 @@ void SPIClass::setHwCs(bool use)

void SPIClass::setFrequency(uint32_t freq)
{
//check if last freq changed
SPI_PARAM_LOCK();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protects "_div" from cross-thread modification. OK

}

void SPIClass::setClockDivider(uint32_t clockDiv)
{
_div = clockDiv;
SPI_PARAM_LOCK();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protects "_div" from cross-thread modification. OK

@@ -138,7 +156,8 @@ void SPIClass::setBitOrder(uint8_t bitOrder)

void SPIClass::beginTransaction(SPISettings settings)
{
//check if last freq changed
SPI_PARAM_LOCK();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protects "_div" from cross-thread modification. OK

@mrengineer7777
Copy link
Collaborator

@smarq8 Apologies for the delay. After studying this code for a couple hours, I agree there is an issue in SPIClass::beginTransaction(). esp32-hal-spi.c does include a Mutex for controlling SPI write access and at least some config changes. It appears to be ok. It's SPI.c/h that may not be thread safe due to overwriting local variables in SPI.c.

Mutex lifecycle: xSemaphoreCreateMutex and vSemaphoreDelete should be in the class constructor/destructors. See https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/HardwareSerial.cpp.

Also you introduced a bug into void spiSimpleTransaction(). Needs to be SPI_MUTEX_LOCK().

Warnings:

Possible leak in esp32-hal-spi.c:
spiStartBus() creates a Mutex, but there is no corresponding vSemaphoreDelete
I suggest it should be in spiStopBus() after all other code.

void SPIClass::beginTransaction(SPISettings settings)
{
    <--NEED "if(_inTransaction)....wait or return error???" or Mutex here
    //check if last freq changed
    uint32_t cdiv = spiGetClockDiv(_spi);
    if(_freq != settings._clock || _div != cdiv) {
        _freq = settings._clock;
        _div = spiFrequencyToClockDiv(_freq);
    }
    spiTransaction(_spi, _div, settings._dataMode, settings._bitOrder);
    _inTransaction = true;	<--THIS should probably be before spiTransaction
}

void SPIClass::endTransaction()
{
    if(_inTransaction){
        _inTransaction = false;  <--THIS should probably be after spiEndTransaction
        spiEndTransaction(_spi);
    }
}

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 29, 2022

about _inTransaction I deduce its intention was to be placed after SPI_MUTEX_LOCK() and release after SPI_MUTEX_UNLOCK() so it might be right. However I'm little sceptical here because Im not sure what happen when another task call SPI_MUTEX_LOCK() in between spiTransaction() and SPI_MUTEX_LOCK() especially in dual core env. On the other hand _inTransaction is guarded by spi->lock mutex then again, it should be safe. Im just thinking aloud.

@mrengineer7777 you also mentioned about missing vSemaphoreDelete(paramLock), then what about spi->lock destructor? Is it also missing?

As well Im woder is it proper to calling vSemaphoreDelete(mutex) in situation where mutex==NULL?

@mrengineer7777
Copy link
Collaborator

about _inTransaction I deduce its intention was to be placed after SPI_MUTEX_LOCK() and release after SPI_MUTEX_UNLOCK() so it might be right. However I'm little sceptical here because Im not sure what happen when another task call SPI_MUTEX_LOCK() in between spiTransaction() and SPI_MUTEX_LOCK() especially in dual core env. On the other hand _inTransaction is guarded by spi->lock mutex then again, it should be safe. Im just thinking aloud.

I'm going to scope this to SPIClass::beginTransaction and endTransaction(). To be honest I'm not 100% on the positioning of _inTransaction. I'll have to defer to the experts. @me-no-dev ?

@mrengineer7777 you also mentioned about missing vSemaphoreDelete(paramLock), then what about spi->lock destructor? Is it also missing?

Do you mean this comment? Yes that's likely a bug that should be fixed.

Possible leak in esp32-hal-spi.c: spiStartBus() creates a Mutex, but there is no corresponding vSemaphoreDelete. I suggest it should be in spiStopBus() after all other code.

No harm in adding SPIClass::~SPIClass(), but adding a call to end() in the destructor does change the library behavior. It may be worth leaving that out and starting a separate PR for that. Good catch.

As well Im woder is it proper to calling vSemaphoreDelete(mutex) in situation where mutex==NULL?

That would be bad. Always check that mutex is valid before deleting. I recommend following the pattern in https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/HardwareSerial.cpp. That implementation appears correct. Also I recommend moving vSemaphoreDelete() to the destructor. For extra safety you can set paramLock=NULL after deleting it.

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 29, 2022

No harm in adding SPIClass::~SPIClass(), but adding a call to end() in the destructor does change the library behavior. It may be worth leaving that out and starting a separate PR for that. Good catch.

As well Im woder is it proper to calling vSemaphoreDelete(mutex) in situation where mutex==NULL?

That would be bad. Always check that mutex is valid before deleting. I recommend following the pattern in https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/HardwareSerial.cpp. That implementation appears correct. Also I recommend moving vSemaphoreDelete() to the destructor. For extra safety you can set paramLock=NULL after deleting it.

Why end() at ~SPIClass() matter? Or you reffer to calling vSemaphoreDelete() inside end() where after you mention that I catch its not proper place because after first end() then it will destroy permanently paramLock unintentionally.
Now I tryed to make it as close to HardwareSerial as possible.

Im also not quite sure about at SPI.h.

#if !CONFIG_DISABLE_HAL_LOCKS
#include "FreeRTOS.h"
#endif

@mrengineer7777
Copy link
Collaborator

Why end() at ~SPIClass() matter? Or you reffer to calling vSemaphoreDelete() inside end() where after you mention that I catch its not proper place because after first end() then it will destroy permanently paramLock unintentionally. Now I tryed to make it as close to HardwareSerial as possible.

end() in ~SPIClass() matters because it changes the SPI shutdown. Is it the correct way to do it? VERY LIKELY. But it's outside the scope of your PR and might introduce unexpected side effects.
You constructors and destructors look great.

Im also not quite sure about at SPI.h.

#if !CONFIG_DISABLE_HAL_LOCKS
#include "FreeRTOS.h"
#endif

FreeRTOS is included in lots of places, so should be safe to include here all the time (no #if required). I would do the same as in https://github.com/espressif/arduino-esp32/blob/master/cores/esp32/HardwareSerial.h
#include "freertos/FreeRTOS.h" Put it after #include "esp32-hal-spi.h".

Copy link
Collaborator

@mrengineer7777 mrengineer7777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible weird bug: In another file the Github compiler doesn't like it if the c++ variables declaration are in a different order than the constructor. I would move paramLock below _inTransaction in SPI.h.

Edit: This is the PR with the variable declaration order issue: #6492

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 29, 2022

whats wrong with Arduino 11 on ubuntu-latest ?
it still complain about SPIClass destructor

/home/runner/Arduino/hardware/espressif/esp32/libraries/SPI/src/SPI.cpp: In constructor 'SPIClass::SPIClass(uint8_t)':
/home/runner/Arduino/hardware/espressif/esp32/libraries/SPI/src/SPI.cpp:58:21: error: no matching function for call to 'SPIClass::~SPIClass()'
 SPIClass::~SPIClass()
                     ^
In file included from /home/runner/Arduino/hardware/espressif/esp32/libraries/SPI/src/SPI.cpp:22:
/home/runner/Arduino/hardware/espressif/esp32/libraries/SPI/src/SPI.h:62:5: note: candidate: 'SPIClass::~SPIClass()'
     ~SPIClass();

@mrengineer7777
Copy link
Collaborator

whats wrong with Arduino 11 on ubuntu-latest ?
it still complain about SPIClass destructor

Looks like you fixed it with the missing parenthesis. Looking good.

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 29, 2022

Until now I write everything on spec due to being unable to compile on my PC. After I line up my compiler then its now much easier.
PS. Is it ok to send so frequently commits or should I avoid this kind of situations?

@mrengineer7777
Copy link
Collaborator

Until now I write everything on spec due to being unable to compile on my PC. After I line up my compiler then its now much easier.
PS. Is it ok to send so frequently commits or should I avoid this kind of situations?

It's probably a bit much, but I'm not a maintainer. They'll likely ask you to flatten it down to one commit before merging.

@smarq8
Copy link
Contributor Author

smarq8 commented Mar 30, 2022

I eventually made some working example. It comapre transactions clock divider to expected and print message when not matched.
It is writed under VSCode+platformIO. My testing board is TTGO T-DISPLAY
env:FAIL is current master and env:FIXED is presented PR.
I tried to avoid disabling watchdig by driving task by task notifications from hw timers but without success.

// main.cpp
#include <Arduino.h>
#include <SPI.h>
#include <esp_task_wdt.h>

void task1(void *p){
  SPISettings s1(40000000,MSBFIRST,SPI_MODE0); // cd = 4097
  while(1){
    SPI.beginTransaction(s1);
    uint32_t cd = SPI.getClockDivider();
    SPI.endTransaction();
    if(cd!=4097){
      Serial.printf("task1 SPI clock wverwritten: %8u %u\n",millis(),cd);
    }
  }
  vTaskDelete(NULL);
}
void task2(void *p){
  SPISettings s2(1000000,MSBFIRST,SPI_MODE0); // cd = 10227713
  while(1){
    SPI.beginTransaction(s2);
    uint32_t cd = SPI.getClockDivider();
    SPI.endTransaction();
    if(cd!=10227713){
      Serial.printf("task2 SPI clock wverwritten: %8u %u\n",millis(),cd);
    }
  }
  vTaskDelete(NULL);
}

void setup() {
  // here is no delay in tasks so watchdog must be disabled
  esp_task_wdt_init(100000, false);
  Serial.begin(115200);
  Serial.printf("\n--->START<---\n");
  SPI.begin();
  // multiple task per core mandatory to reveal issue
  // othervise it might take so long before any fail
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
  xTaskCreatePinnedToCore(task1,"task1",4096,NULL,5,NULL,0);
  xTaskCreatePinnedToCore(task2,"task2",4096,NULL,5,NULL,1);
}

void loop() {
}
; platformio.ini
[env:FAIL]
platform = https://github.com/platformio/platform-espressif32.git#feature/arduino-upstream
platform_packages = framework-arduinoespressif32 @ https://github.com/espressif/arduino-esp32#master
board = esp32dev 
framework = arduino
monitor_speed = 115200 

[env:FIXED]
platform = https://github.com/platformio/platform-espressif32.git#feature/arduino-upstream
platform_packages = framework-arduinoespressif32 @ https://github.com/smarq8/arduino-esp32#master
board = esp32dev 
framework = arduino
monitor_speed = 115200

output in FAIL env

task2 SPI clock wverwritten:    40418 4097
task2 SPI clock wverwritten:    40458 4097
task2 SPI clock wverwritten:    40668 4097
task2 SPI clock wverwritten:    40933 4097
task2 SPI clock wverwritten:    41035 4097
task2 SPI clock wverwritten:    41377 4097
task1 SPI clock wverwritten:    41438 10227713
task2 SPI clock wverwritten:    41690 4097
task2 SPI clock wverwritten:    41773 4097
task2 SPI clock wverwritten:    41921 4097
task2 SPI clock wverwritten:    42091 4097
task2 SPI clock wverwritten:    42174 4097
task2 SPI clock wverwritten:    42200 4097
task2 SPI clock wverwritten:    42324 4097
task2 SPI clock wverwritten:    42393 4097
task2 SPI clock wverwritten:    42447 4097
task2 SPI clock wverwritten:    42614 4097
task2 SPI clock wverwritten:    42709 4097
task2 SPI clock wverwritten:    42748 4097
task1 SPI clock wverwritten:    42826 10227713
task1 SPI clock wverwritten:    43062 10227713
task2 SPI clock wverwritten:    43617 4097
task2 SPI clock wverwritten:    43713 4097
task2 SPI clock wverwritten:    43997 4097
task2 SPI clock wverwritten:    44227 4097
task2 SPI clock wverwritten:    44269 4097
task2 SPI clock wverwritten:    44372 4097
task2 SPI clock wverwritten:    44386 4097
task2 SPI clock wverwritten:    44522 4097
task2 SPI clock wverwritten:    44630 4097
task2 SPI clock wverwritten:    44766 4097
task2 SPI clock wverwritten:    44810 4097

@VojtechBartoska VojtechBartoska added this to the 2.0.4 milestone May 4, 2022
@VojtechBartoska
Copy link
Contributor

Status update: added to 2.0.4 milestone, we will take a look on this in next 2 weeks

@me-no-dev
Copy link
Member

Thanks all for the contribution :) Merging

@me-no-dev me-no-dev merged commit bdeef89 into espressif:master May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

5 participants